Genome-wide analysis of genetic susceptibility to
language impairment in an isolated Chilean population 

Pia Villanueva* 1,2,3,5, Dianne F Newbury* 4,5, Lilian Jara 1, Zulema De Barbieri 2, Ghazala Mirza 4,
Hernan M Palomino 3, Maria Angelica Fernandez 2, Jean-Baptiste Cazier 4, Anthony P Monaco 4
and Hernan Palomino 1

Specific language impairment (SLI) is an unexpected deficit in the acquisition of language skills and affects between 5 and 
8% of pre-school children. Despite its prevalence and high heritability, our understanding of the aetiology of this disorder is 
only emerging. In this paper, we apply genome-wide techniques to investigate an isolated Chilean population who exhibit an 
increased frequency of SLI. Loss of heterozygosity (LOH) mapping and parametric and non-parametric linkage analyses indicate 
that complex genetic factors are likely to underlie susceptibility to SLI in this population. Across all analyses performed, 
the most consistently implicated locus was on chromosome 7q. This locus achieved highly significant linkage under all three 
non-parametric models (max NPL=6.73, P=4.0×10⁻¹¹). In addition, it yielded a HLOD of 1.24 in the recessive parametric
linkage analyses and contained a segment that was homozygous in two affected individuals. Further, investigation of this 
region identified a two-SNP haplotype that occurs at an increased frequency in language-impaired individuals (P=0.008). 
We hypothesise that the linkage regions identified here, in particular that on chromosome 7, may contain variants that 
underlie the high prevalence of SLI observed in this isolated population and may be of relevance to other populations 
affected by language impairments. 
INTRODUCTION

Specific language impairment (SLI) is a profound deficit in the 
acquisition of language despite adequate intelligence and opportunity, 
in the absence of any possible medical aetiology. 1 This disorder is a 
common developmental condition affecting between 5% and 8% of 
pre-school children, and thus places a heavy burden upon health- 
related and educational services. 2 It is well documented that SLI has a 
strong genetic basis (reviewed by Stromswold 3 ). However, it is 
proposed that susceptibility to this disorder is complex in nature 
involving multiple genes, in combination with environmental factors. 4 
The genetic bases of complex disorders are notoriously difficult to
characterise, as the contributing factors can vary greatly between
affected individuals and may be masked by undetermined environmental
effects. This is reflected in the fact that, to date, only four
genetic loci 5-7 and three associated candidate genes 8,9 have been
described for SLI (OMIM no. 606711 (SLI1), OMIM no. 606712
(SLI2), OMIM no. 607134 (SLI3), OMIM no. 612514 (SLI4), OMIM
no. 612514 (CNTNAP2, SLI4), OMIM no. 613082 (ATP2C2, in SLI1)
and OMIM no. 610112 (CMIP in SLI1)).

Isolated founder populations can provide an important resource in 
the identification of causal genes underlying complex disorders. 10
Such populations are derived from a small number of relatively recent
ancestors and thus are relatively homogeneous, a point which can
greatly assist gene mapping processes. 11 Furthermore, one may postulate
that loci identified in founder populations may hold more
relevance to the general population than those yielded by the study
of rare monogenic forms of impairment. In 2008, Villanueva et al 12
described a Chilean founder population with an increased incidence of 
SLI (known as TEL in Spanish-speaking countries). This population 
inhabit the Robinson Crusoe Island, which forms part of the Juan 
Fernandez Archipelago, 677 km to the west of Chile, South America. 
Robinson Crusoe Island is the only inhabited island in the archipelago 
and has 633 residents. The most recent colonisation dates to the late 
nineteenth century when the island was repopulated by a group of 
eight families. A total of 77% of the current population has at least one 
of the colonising surnames supporting a high degree of consanguinity. 
Linguistic profiling of the colonising children indicated that 35% met 
current criteria for SLI (expressive or comprehensive language >2SD 
below that expected for their age), 27.5% had language deficits 
associated with other pathologies (eg, delayed psychomotor development,
intellectual deficit or auditory impairment) and 37.5%
displayed normal language skills. 12 In contrast, the frequency of SLI 
in the non-colonising children (3.8%) coincided with that reported
for mainland Chile (~4%). 13 Genealogical reconstruction indicated 
that 75% of known affected individuals were descended from a single 
pair of founder brothers. 12 This population therefore represents a rare 
resource, which may be valuable in the identification of genetic loci 
contributing to susceptibility to SLI. 

In this study, we perform genome-wide loss of heterozygosity 
mapping and parametric and non-parametric linkage analysis of the 
Robinson Crusoe population. We identify five regions (on chromosomes
6, 7, 12, 13 and 17) that meet genome-wide significance, and
several loci that are consistently implicated across alternative
analyses. We hypothesise that these regions may contain variants 
that underlie the high prevalence of SLI observed in this isolated 
population. 

SUBJECTS AND METHODS 

This work was approved by the ethics department of the University of Chile. 
Informed consent was given by all participants and/or, where applicable, their 
parents. 

DNA was extracted from EDTA whole blood samples collected from all available 
SLI and language-normal probands and their immediate families (125 individuals 
from 34 families, Table 1) using a standard chloroform extraction protocol. 

All Island inhabitants between 3 and 8 years, 11 months of age (n=66) were
subjected to a linguistic battery, which included tests of phonology (Test para 
Evaluar Procesos de Simplificacion Fonologica (TEPROSIF) 14 ) and expressive 
and receptive morphosyntax (Toronto Spanish Grammar Exploratory test 15 ). 



Table 1 Sample structure

            N     SLI (%)   Language normal (%)
Probands    34    12 (35)   22 (65)
Sibs        22    5 (23)    17 (77)
Half-sibs   6     4 (67)    2 (33)
Parents     61    21 (34)   40 (66)
Total       123   42 (34)   81 (66)
Male        55    17 (39)   38 (47)
Female      68    25 (61)   43 (53)

Abbreviation: SLI, specific language impairment.

A total of 123 samples were analysed. These included 42 language impaired individuals and
81 language normal individuals.

Percentage of SLI and language normal probands, sibs, half-sibs, parents and totals are given
as a percentage of the total number of the appropriate group.

Percentage of males and females are given as a percentage of the language group.

Values in bold are the total number of samples.



Any child who performed >2SD below that expected for their age was 
classified as having SLI. Exclusion criteria included non-verbal IQ (Columbia 
Mental Maturity Scale) below the 80th percentile, hearing disability, motor or 
structural abnormalities (Oral Motor and Speech Examination 16 ) and a 
co-morbid diagnosis of autism, emotional difficulties, or neurological disorders 
(as assessed by medical history). Following proband ascertainment, available 
family members were assessed for the presence of SLI. Individuals who fell 
outside the age-range of available standardised tests (3 years to 8 years 11 months)
were assessed through a family history interview 17 and tests of verbal fluency 
(Barcelona test 18 ), verbal comprehension (Token test 19 ), non-verbal intelligence 
(Raven's progressive matrices 20 ) and auditory screening. The identification and 
classification of probands formed part of the descriptive study by Villanueva 
et al (2008). As this previous manuscript was in Spanish, detailed assessment 
descriptions are provided as Supplementary Material. 

The present study considers only families derived from colonising families of 
the Robinson Crusoe Island (ie, at least one ancestor related to a founder 
member). 

Genotyping 

DNA was quantified by a pico-green assay (Quant-iT, http://www.invitrogen.com).
In total, 125 samples were genotyped on the Illumina HumanLinkage-12
panel following the multi-sample Infinium II assay (http://www.illumina.com).
These beadchips allow the genotyping of 6090 genome-wide single nucleotide 
polymorphisms (SNPs) and simultaneously analyse 12 DNA samples. 

Quality control procedures 

All genotypes were called within Beadstudio (Version 3, Illumina Inc., San 
Diego, CA, USA). Any SNP with a gentrain score below 0.9 was manually 
inspected and, if necessary, the clusters adjusted. A total of 18 samples were
duplicated across arrays. Any SNP with a gentrain score below 0.5 (n=27), a
call rate below 0.97 (n=4) or a minor allele frequency below 2.5% (n=2) was
excluded from further analyses. 
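
The exclusion step above can be written as a simple predicate. The following is a minimal sketch, not the Beadstudio workflow used in the study: the `SnpQc` record, the `passes_qc` function and the four example SNPs are hypothetical, and only the three cut-offs (gentrain 0.5, call rate 0.97, minor allele frequency 2.5%) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP quality summary (not a Beadstudio export format)."""
    name: str
    gentrain: float   # cluster-quality score
    call_rate: float  # fraction of samples with a called genotype
    maf: float        # minor allele frequency


def passes_qc(snp, min_gentrain=0.5, min_call_rate=0.97, min_maf=0.025):
    """Return True if the SNP survives all three exclusion filters."""
    return (snp.gentrain >= min_gentrain
            and snp.call_rate >= min_call_rate
            and snp.maf >= min_maf)


# Made-up SNPs, each failing at most one filter.
snps = [
    SnpQc("snp_a", 0.92, 0.999, 0.31),  # passes all filters
    SnpQc("snp_b", 0.41, 0.995, 0.22),  # fails gentrain
    SnpQc("snp_c", 0.88, 0.960, 0.18),  # fails call rate
    SnpQc("snp_d", 0.95, 0.998, 0.01),  # fails minor allele frequency
]
kept = [s.name for s in snps if passes_qc(s)]
print(kept)
```

SNPs with a gentrain score between 0.5 and 0.9 pass this predicate; in the study these were inspected manually, a step that the sketch does not model.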

All called genotypes were subjected to a haplotypic error detection algorithm 
in MERLIN. 21 All identified unlikely genotypes (P<0.001) were re-examined
and, if necessary, excluded. Probabilities of Hardy-Weinberg Equilibrium
(HWE) were calculated within PEDSTATS 22 and any SNP with a HWE P-value
<0.001 (2 of 5666 SNPs examined) was identified for cautious treatment in the
remaining analyses. 
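
For readers unfamiliar with the HWE check, the idea can be illustrated with the textbook chi-square test on genotype counts (1 degree of freedom). This is a sketch under stated assumptions: the test implemented in PEDSTATS may differ (for example, an exact test), the function name and example counts are hypothetical, and relatedness between individuals is ignored.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) P-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Made-up counts: the first SNP matches HWE exactly, the second has no heterozygotes.
p_balanced = hwe_chi2_pvalue(25, 50, 25)
p_no_hets = hwe_chi2_pvalue(50, 0, 50)
flagged = p_no_hets < 0.001  # threshold used in the text
```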

Allele-sharing between individuals was examined using the Graphical
Representation of Relationships (GRR). 23 This software calculates mean
Identity by State (IBS) values for all possible pairs of samples and clusters
individuals accordingly. Any individual found to cluster outside the expected
IBS values was further examined. This error checking stage identified two
DNA samples that had been mislabelled and were therefore excluded. 
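
The pairwise IBS summary that underlies this check can be sketched as follows. This is an illustrative calculation only and does not call GRR: genotypes are assumed to be coded as the number of copies of one allele (0, 1 or 2, with `None` for a missing call), and the function name and example vectors are hypothetical.

```python
from statistics import mean, pstdev


def ibs_summary(g1, g2):
    """Mean and standard deviation of alleles shared identical by state (0, 1 or 2)."""
    shared = [2 - abs(a - b)
              for a, b in zip(g1, g2)
              if a is not None and b is not None]  # skip SNPs with a missing call
    return mean(shared), pstdev(shared)


# Duplicate samples share two alleles IBS at every SNP.
dup_mean, dup_sd = ibs_summary([0, 1, 2, 2], [0, 1, 2, 2])

# Made-up pair: opposite homozygotes at the first SNP, identical at the second,
# and a missing call at the third (ignored).
pair_mean, pair_sd = ibs_summary([0, 0, 2], [2, 0, None])
```

A pair whose mean IBS falls well outside the range seen for its stated relationship (for example, a supposed parent-child pair sharing no alleles at many SNPs) would be a candidate for the kind of labelling error described above.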

Generation of linkage pedigrees 

Genealogical information was collated from birth and marriage certificates, 
family names and parent and relative interviews. Known relationships between
identified nuclear families and the relevant pair of founder brothers were
reconstructed and examined within the Progeny software (www.progenygenetics.com)
(Figure 1).

Figure 2 Small pedigrees used for linkage analyses. The larger pedigree shown in Figure 1 was broken into seven smaller pedigrees of maximum size 24 bits for linkage analyses.

Homozygosity mapping 

Genotype data from all affected individuals were analysed for loss of hetero- 
zygosity within PLINK. 24 Sliding windows of 20-SNP genotypes were examined 
for runs of homozygosity. In all, 42 affected individuals from 23 nuclear 
families were examined including 2 affected sib-pairs, an affected trio of siblings 
and 3 affected half-sib-pairs. Previous studies have found that runs of
homozygosity <4Mb are common in outbred individuals. 25 Segments were 
therefore defined as homozygous tracts if 10 homozygous SNPs were found to 
extend across a region greater than 4 Mb in size. 
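
The tract definition above (at least 10 homozygous SNPs spanning more than 4 Mb) can be sketched as a scan over consecutive genotype calls. This is a simplified illustration and not the PLINK sliding-window algorithm, which can tolerate occasional heterozygous or missing calls within a window. The function name, the genotype coding (0 or 2 for homozygous, 1 for heterozygous, `None` for missing) and the example data are assumptions.

```python
def homozygous_tracts(positions, genotypes, min_snps=10, min_bp=4_000_000):
    """Return (start_bp, end_bp, n_snps) for each qualifying run of homozygous calls."""
    tracts, run = [], []
    # A trailing heterozygous sentinel flushes a run that reaches the last SNP.
    for pos, g in list(zip(positions, genotypes)) + [(None, 1)]:
        if g in (0, 2):  # homozygous call extends the current run
            run.append(pos)
            continue
        # Heterozygous or missing call: close the run and apply both criteria.
        if len(run) >= min_snps and run[-1] - run[0] > min_bp:
            tracts.append((run[0], run[-1], len(run)))
        run = []
    return tracts


# Made-up chromosome: one SNP every 500 kb, 12 consecutive homozygous calls.
positions = [i * 500_000 for i in range(30)]
genotypes = [1] * 5 + [0, 2] * 6 + [1] * 13
tracts = homozygous_tracts(positions, genotypes)
print(tracts)
```

At a marker spacing of roughly one SNP every 500 kb, a run of 10 SNPs already spans about 4.5 Mb, so the two criteria are closely coupled on a panel of this density.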

Linkage analyses 

Genotype data were stored in the Integrated Genotyping System (IGS) 
database. Individuals were classified as affected or unaffected on the basis of 
linguistic testing as described in 'subjects'. Data were analysed for linkage within 
MERLIN (autosomes) and the MERLIN extension, MINX (X chromosome). 21 
As linkage packages were unable to analyse genome-wide data for the 242-bit
pedigree as a whole, it was broken into sub-pedigrees. This segmentation was
manually performed on the basis of closest shared ancestor. Seven extended
families of 20-24 bits (where a bit is defined as 2 × the number of
non-founders − the number of founders) were defined and included 41 affected
individuals and 63 of the 123 genotyped individuals (Figure 2). Although some 
individuals were present across multiple sub-pedigrees, all affected-relative 
pairs were only represented once. Genotype data for unaffected individuals 
were used for haplotype analyses (described below). 
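
The bit measure defined above is straightforward to compute when deciding how to split a pedigree. The sketch below applies the stated definition; the function name and the two example pedigrees are hypothetical and do not correspond to the sub-pedigrees in Figure 2.

```python
def pedigree_bits(n_founders, n_non_founders):
    """Pedigree complexity in bits: 2 x non-founders minus founders."""
    return 2 * n_non_founders - n_founders


# Nuclear family: two parents (founders) and three children.
nuclear = pedigree_bits(n_founders=2, n_non_founders=3)

# Made-up extended family at the upper size limit used in this study.
extended = pedigree_bits(n_founders=6, n_non_founders=15)
```

Because the computational cost of exact multipoint linkage grows rapidly with the number of bits, a 242-bit pedigree is far beyond what can be analysed as a single unit, which motivates the 20-24 bit sub-pedigrees.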

Parametric linkage analyses were performed under dominant and recessive 
models of linkage assuming a disease frequency of 35% (as described in the 
Robinson Crusoe population) and full penetrance. As the model of inheritance 
for SLI is unknown (and not expected to be monogenic in most instances) we 
also performed non-parametric analyses. Although explicit input parameters 
are not necessary for the completion of non-parametric analyses, expected allele 
frequencies must be specified. In this study, because of the isolated nature of the 
population, we had no directly appropriate control data and therefore 
performed three non-parametric analyses using alternative allele frequency 
estimation strategies. First, we used allele frequencies of all genotyped
individuals (n=123). These individuals are derived directly from the population
under study and can therefore be expected to provide representative
expected allele frequencies. Nonetheless, these data are derived from related 
individuals and can therefore lead to a bias. We therefore repeated the analyses 
using allele frequency data from genotyped founder individuals of the gener- 
ated sub-pedigrees (ie, those who marry into the pedigree, n=9). Although this 
reduces the dependence between individuals, it relies upon a small number of 
data points. We therefore also performed linkage analysis using allele frequency 
data from 60 unrelated CEPH individuals. The Y chromosome SNP data of the 
Robinson Crusoe population indicated that the founder males were European 
in origin (data not shown). Non-parametric results are reported as NPL scores 
and threshold levels for genome-wide significance are in line with those
suggested by Kruglyak and Lander. 26 Namely, NPL scores of >3.8
(P=7.4×10⁻⁴) are described as suggestive linkage, NPL scores >4.08
(P=2.2×10⁻⁵) as significant and NPL scores >4.99 (P=3.0×10⁻⁷) as highly
significant. Using a Bonferroni multiple testing correction for the three
non-parametric analyses run, these thresholds equate to P=2.46×10⁻⁴,
P=7.3×10⁻⁶ and P=1.0×10⁻⁷, respectively. In this instance, we expect the
Bonferroni correction to be over-conservative because of the high expected
correlation between the three analyses. 
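
The Bonferroni-adjusted thresholds quoted above follow from dividing each pointwise threshold by the number of non-parametric analyses. A short sketch of the arithmetic is given here; the dictionary keys and variable names are illustrative only.

```python
# Pointwise P-value thresholds quoted in the text.
thresholds = {
    "suggestive": 7.4e-4,
    "significant": 2.2e-5,
    "highly significant": 3.0e-7,
}
n_analyses = 3  # three allele-frequency strategies

# Bonferroni: divide each threshold by the number of tests.
corrected = {label: p / n_analyses for label, p in thresholds.items()}

for label, p in corrected.items():
    print(f"{label}: {p:.3g}")
```

The results agree with the values in the text to the precision reported there (approximately 2.5×10⁻⁴, 7.3×10⁻⁶ and 1.0×10⁻⁷).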

Haplotype analyses 

Haplotypes were reconstructed for the chromosome 7 region of linkage within 
nuclear 2-generation families using MERLIN. 21 Two-SNP sliding windows were 
visually inspected for allele combinations that co-segregated with affection 
status. All haplotypes that were found to have odds ratios of >2.0 or <0.5 
(n=5) were analysed for association within PLINK using all genotyped cases 
and controls under a linear model. 24 In these analyses, no correction was made 
for the relationships between cases and controls. Association analyses of 
simulated data-sets yielded a distribution of empirical P-values that fit well 
with those expected under the theoretical model indicating that, in this 
particular case, the relationships between individuals do not inflate the 
significance of the results obtained (data not shown). Measures of linkage 
disequilibrium (LD) were calculated within haploview. 27 
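
The odds-ratio screen used to select haplotypes for association testing can be illustrated with a 2×2 table of carriers and non-carriers. This is a sketch only: the function names and counts are hypothetical, and the paper does not state how odds ratios were computed or whether a continuity correction was applied.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a haplotype in cases relative to controls."""
    if case_without == 0 or control_with == 0:
        raise ValueError("odds ratio undefined when a denominator cell is zero")
    return (case_with * control_without) / (case_without * control_with)


def passes_screen(or_value, upper=2.0, lower=0.5):
    """Apply the screen described in the text: OR > 2.0 or OR < 0.5."""
    return or_value > upper or or_value < lower


# Made-up counts: 30 of 40 cases and 20 of 40 controls carry the haplotype.
enriched = odds_ratio(30, 10, 20, 20)
# Made-up counts with no difference between groups.
neutral = odds_ratio(20, 20, 20, 20)
```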

RESULTS 

Table 2 Homozygous segments shared between more than two affected individuals

Chromosome  Start (bp)   End (bp)     Size (bp)    Number of SNPs  Number of individuals  Homozygous individuals
2           169,542,195  173,937,368  4,395,173    9               2                      relationship unknown
4           73,731,890   78,761,621   5,029,731    11              2                      relationship unknown
6           71,779,542   77,471,874   5,692,332    15              3                      unrelated
6           77,572,235   77,572,235   0            1               2
6           87,364,428   87,532,681   168,253      3               2
6           88,115,604   92,044,752   3,929,148    17              3                      unrelated
6           92,098,625   93,639,259   1,540,634    9               2
7           108,674,847  114,462,759  5,787,912    13              2                      relationship unknown
8           18,755,221   19,196,467   441,246      3               2
8           19,559,214   23,746,576   4,187,362    16              3                      1 sib pair and 1 unrelated
9           83,685,047   93,408,941   9,723,894    20              2                      relationship unknown
10          3,791,413    9,111,974    5,320,561    22              2                      relationship unknown
11          54,867,814   54,867,814   0            1               3
11          55,360,988   59,674,738   4,313,750    12              4                      1 sib pair and 2 unrelated
11          59,957,022   60,616,462   659,440      2               2
13          41,575,238   46,431,276   4,856,038    14              2                      relationship unknown
14          20,899,244   24,950,428   4,051,184    14              2                      unrelated
14          94,718,410   98,165,036   3,446,626    8               2
14          98,298,832   99,813,174   1,514,342    8               3                      unrelated
14          100,345,436  101,474,494  1,129,058    4               2
15          36,837,208   36,837,208   0            1               2
15          37,016,395   37,119,086   102,691      5               3
15          [illegible]  [illegible]  [illegible]  16              4                      1 sib pair and 2 unrelated
15          81,012,306   88,150,562   7,138,256    20              2                      relationship unknown
16          15,723,647   19,953,169   4,229,522    18              2                      unrelated
19          50,678,730   57,149,118   6,470,388    19              2                      relationship unknown
20          52,260,700   58,384,823   6,124,123    18              2                      relationship unknown
21          32,754,546   35,223,308   2,468,762    5               2
21          35,233,892   38,156,688   2,922,796    18              3                      unrelated
21          38,599,459   39,524,326   924,867      3               2

Abbreviation: SNPs, single nucleotide polymorphisms.

Start and end positions give positions of the extremities of overlapping segments between all individuals (in bp, B36). Boxed segments are contiguous.

Pedigree reconstructions confirmed that of the 44 affected individuals
from whom we had DNA, 37 (84%) were descendants of a pair of
founder brothers (Figure 1), 3 (7%) were not related to the founder
brothers and 4 (9%) had unknown ancestry. Following quality control,
genotypes were available for 6009 SNPs (5666 autosomal) with an 
average spacing of one SNP every 490 kb. The average genotype call 
rate was 99.9%. The minimum SNP genotype rate was 94.3% and the 
minimum SNP heterozygosity was 4%. Two individuals (both 
affected) were excluded from the analyses yielding genotype data for 
123 individuals with an average individual genotype rate of 99.9% and 
a minimum individual genotype rate of 99.2%. The genotype mismatch
rate across duplicated samples was 0.0027% and two SNPs were
found to have a Hardy-Weinberg P-value of <0.001.

Of the 42 affected individuals examined, 28 showed at least one 
tract of homozygosity. Across all affected individuals, an average of 
13.1Mb (median, 5.4 kb) of the genome consisted of homozygous 
tracts. In individuals whose parents were known to be first or second 
cousins (n=6), this figure increased to 26.3 Mb (median, 28.9 kb). No 
chromosome region was found to be homozygous in all affected 
individuals, but two chromosome regions were homozygous in four 
(10%) affected individuals. These comprised a 4 Mb region of
chromosome 11 and a 6 Mb region of chromosome 15, both of which
were homozygous in a sib-pair and two additional unrelated indivi- 
duals (Table 2). In total, 18 chromosome regions contained over- 
lapping segments of homozygosity (Table 2). 
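
The start and end positions in Table 2 are the extremities of the overlap between homozygous segments of different individuals. One way to compute such an overlap is to take the largest start and the smallest end, as sketched below. The function name and the per-individual segments are hypothetical; the example was constructed so that the overlap reproduces the 4 Mb chromosome 11 interval listed in Table 2.

```python
def shared_segment(segments):
    """Overlap of (start_bp, end_bp) segments, one per individual, or None if disjoint."""
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    if start > end:
        return None
    return start, end, end - start


# Made-up homozygous segments in three individuals on the same chromosome.
segments = [
    (54_000_000, 60_000_000),
    (55_360_988, 61_000_000),
    (53_000_000, 59_674_738),
]
overlap = shared_segment(segments)

# Segments that touch at a single SNP position give an overlap of size 0.
single_snp = shared_segment([(10, 500), (500, 900)])
```

Under this definition, an overlap that consists of a single SNP has a size of 0 bp, which is consistent with the Table 2 rows that list one SNP and a size of 0.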

No chromosome region reached parametric genome-wide significance
(HLOD>3, Figure 3). Maximum HLODs were observed on
chromosome 8 for the dominant model (rs1390950, HLOD=2.4,
Figure 3) and chromosome 1 for the recessive model (rs1906255,
HLOD=1.52, Figure 3). Under the recessive model, chromosome
15 gave HLOD scores marginally above 1 (maximum HLOD=1.05)
in a region that was homozygous in four affected individuals
(Table 3).

Non-parametric linkage analyses identified five chromosome
regions (chromosomes 6q, 7, 12, 13 and 17) that reached the threshold
for genome-wide significance (NPL>4.08, P<2.2×10⁻⁵). Three of
these (chromosomes 6q, 7 and 12) were highly significant
(NPL>4.99, P<3.0×10⁻⁷) (Table 3, Figure 3), even after an
over-conservative Bonferroni correction for the three non-parametric tests
performed. The linkages to chromosomes 6q and 12 were only
observed in a single non-parametric analysis whereas those to
chromosomes 7, 13 and 17 were consistent across all non-parametric analyses
performed (Figure 3, Table 3).

European Journal of Human Genetics

Genome analysis of SLI in a Chilean isolate

P Villanueva et al

[Figure 3 plot; x-axis: Cumulative Chr Posn (Mbp)]

Figure 3 Genome-wide linkage analyses. Traces are shown for parametric analyses using both dominant and recessive models with full penetrance and three
non-parametric models utilising expected allele frequencies derived from CEPH population, from genotyped founders in the sub-pedigrees and from all
genotyped individuals. Traces are also shown for identified stretches of homozygosity (where the X-axis represents the number of individuals found to be
homozygous across the region).

The most consistently implicated region was on chromosome 7.
This locus achieved highly significant linkage under all three
non-parametric models (max NPL=6.73, P=4.0×10⁻¹¹) and contained a
region that gave a HLOD of 1.24 in the recessive parametric linkage
analyses and a segment that was found to be homozygous in two
affected individuals (Table 3, Figure 4). Linkage analyses within each
of the sub-pedigrees revealed that four families were contributing to
linkage at this locus (linkage peds 3, 5, 6 and 7 (Figure 2), data not
shown). Segregation analyses of two-SNP sliding windows across this
region identified five 2-SNP combinations that were present in at least
90% of affected individuals. Further investigations in all genotyped
individuals indicated that one of these haplotypes (rs727714/rs969356, AG)
occurred at a significantly lower frequency in unaffected
individuals than in affected individuals (Supplementary Table 1). The AG
genotype of the rs727714/rs969356 haplotype was present in 98% of
cases and 76% of controls and had an allele frequency of 67% in cases
and 48% in controls (P=0.008). This association remains marginally
significant (P=0.04) after the application of a Bonferroni correction.
This haplotype covered 74 kb of sequence and coincided with the
non-parametric (All) peak of linkage. It lay 2.5 Mb proximal to the SNPs
with the highest NPL score in the two alternative non-parametric
analyses (rs1524341 and rs1024676, D'=0.21-0.23, Tables 3 and 4)
and was ~2.6 Mb distal to the region of parametric linkage and 3 Mb
proximal to a segment of homozygosity. Investigation of the LD
structure indicated that the rs727714/rs969356 haplotype showed
moderate (D'>0.4 and LOD>2) long-range LD with surrounding
variants (Table 4), which may provide an alternative explanation for
the association observed. One of the two haplotype SNPs (rs727714)
falls in exon 3 of the NOBOX gene creating a synonymous base
substitution.

As expected, given the density of the panel used in this study, single 
SNP association across the entire region of linkage on chromosome 7 
did not identify any significant associations (minP across linkage 
region=0.02, Figure 4). As single SNPs, rs727714 and rs969356 yielded 
association P-values of 0.04 and 0.13, respectively.

DISCUSSION 

In this paper, we perform genome-wide analyses of an isolated Chilean
population affected by Specific Language Impairment (SLI). Homozygosity
mapping and parametric linkage analyses did not identify any
chromosome segments that co-segregate with SLI in this population,
suggesting that a completely penetrant monogenic aetiology is unlikely.
This hypothesis is further supported by the observed nature of
the language impairments. Affected individuals do not present with a
specific core phenotype as may be predicted under a monogenic
model, but instead show extensive heterogeneity in the severity and
nature of impairment, as is typical of
complex genetic forms of SLI.

The most consistent region of linkage extended across 48 Mb of
chromosome 7q (chromosome position 111 285 062-158 710 965).
This region reached a maximum NPL score of 6.73 (P=4.0×10⁻¹¹)
and achieved genome-wide significance in all three non-parametric 
analyses performed and overlapped with a peak of parametric 
linkage (recessive model max HLOD=1.24) and two segments of 
homozygosity. Although these are not independent observations 
and a number of alternative analyses were performed, the reliability 
of the linkage in this region is consistent with that expected from a 
true positive. 

Segregation analyses identified a two-SNP haplotype that was found
at a marginally increased frequency in cases compared with controls (P=0.008).
This haplotype fell across the NOBOX (OMIM no. 610934) and TPK1 
(OMIM no. 606370) genes. NOBOX is a homeobox gene, which is 
preferentially expressed in oocytes, but not reported to be expressed in 
brain. 28 TPK1 encodes the thiamine pyrophosphokinase 1 enzyme, 
which catalyses the conversion of thiamine to thiamine pyrophosphate.
Thiamine (or vitamin B1) is essential for the metabolism
of carbohydrates into glucose and acts as a co-enzyme in the production
of acetylcholine. Thiamine deficiency forms part of numerous
disorders including ataxia, confusion and impaired memory. 29 Interestingly,
a recent study suggested a link between thiamine deficiency
and syntactic and lexical disorder. 30 The chromosome 7 peak also
overlaps with the AUTS1 locus of linkage to autism 31 and includes 
both the FOXP2 and CNTNAP2 genes, both of which have previously 
been associated with language disorders. 9,32 The genotyping panels
utilised in this study were optimised for linkage investigations and
thus involve a relatively sparse map of SNPs (~1 SNP every 500 kb).
The fine mapping of these regions is therefore required to enable the 
identification of candidates in an unbiased manner. We found that the 
two-SNP haplotype on chromosome 7 showed moderate long-range 
linkage disequilibrium with a number of SNPs indicating that further
information would be required to narrow the linkage peak. Higher 
density SNP arrays would also enable the detection of smaller runs of 
homozygosity. 

We did not observe any linkage to chromosomes 16 or 19, which 
have previously been implicated in SLI. 5,6,33 Again, this may be caused
by the low density of markers investigated in the present study. 
Alternatively, as the loci on chromosome 16 and 19 were identified 
by a quantitative genome screen of language-related measures, this 
may reflect differences in study design. As the Chilean quantitative
linguistic data were collected only for subjects within a restricted age
range (3 to 9 years), the current study utilised a binary affection
status. This is similar to the approach applied by Bartlett et al (2002,
2004) in their genome screen for SLI in which they identified a region 
of linkage on chromosome 13 (SLI3), which overlaps with that found 
by the present study. This region has also been linked to autism, 34 a 
result which was strengthened by the selection of families on the basis 
of linguistic data. 35 Our chromosome 13 linkage consisted of two 
adjacent peaks. The distal peak (34-48 Mb) overlapped with a segment
of homozygosity and achieved a maximum NPL score of 4.8
(P=8.0×10⁻⁷) using CEPH allele frequencies. The proximal peak
(83-94 Mb) reached an NPL of 3.5 (P=0.0002) under all non-parametric
analyses performed and coincided with an area of marginal
linkage under a recessive parametric model. 

In addition to the linkages on chromosomes 7 and 13, we also
observed significant linkage (NPL>4.08 (P<2.2×10⁻⁵)) to chromosome
17 and highly significant linkages (NPL>4.99 (P<3.0×10⁻⁷))
to chromosomes 6q and 12 (Figure 3, Table 3). However, these peaks
were only observed under a single non-parametric model and not in 
models using alternative expected allele frequencies. It is therefore 
likely that these divergent results may be driven by differences in the 
allele frequencies of the control populations used and illustrate the 
importance of correctly estimating allele frequencies, especially for 
markers that are in linkage disequilibrium. 36 Indeed, we found that 
the correlation of expected allele frequencies between the three 
different control groups was moderate (0.41-0.70 across all SNPs) 
and was lower than average across the conflicting regions of linkage 
on chromosome 6 and 12 (as low as 0.29 and 0.09, respectively), but 
remained moderate across the region of linkage on chromosome 7 
(0.48-0.67). Importantly, simulation studies indicate that although 
allele frequency misspecification can lead to false positives, this artefact 
is not expected to affect the power to detect true linkages. 37 Thus, 
although the loci on chromosome 6 and 12 reached a threshold 
of highly significant linkage, as these were observed with only one 
non-parametric analysis, we must recognise the possibility that 
they represent false positives, especially given the high number of 
tests performed. Instead, a more fruitful avenue of investigation may 
be provided by the examination of regions found to be consistently 
implicated across all three analyses performed, even in cases where this 
linkage did not reach genome-wide significance (eg, chromosomes 2,
6p, 8, 9, 15 and 17; Table 3, Supplementary Figure 1).
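
The comparison of expected allele frequencies between reference groups discussed above can be illustrated with a Pearson correlation across SNPs. This is a sketch under stated assumptions: the paper does not specify the correlation measure or the window over which regional values were computed, and the function name and frequency vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up allele frequencies for the same four SNPs in two reference groups.
freq_all_genotyped = [0.1, 0.4, 0.5, 0.8]
freq_ceph = [0.2, 0.3, 0.6, 0.7]
r = pearson(freq_all_genotyped, freq_ceph)
```

Applying such a calculation to the SNPs within each linkage region would show where the reference groups disagree most, which is where linkage signals that depend on a single allele-frequency strategy deserve the most caution.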

In conclusion, this study has applied a genome-wide approach to 
identify loci which may contain genes underlying susceptibility 
to SLI in an isolated population. This study represents the first step 
in the detection of genetic variants that underlie the increased 
frequency of language impairments in this population. It is envisaged 
that the fine mapping of the identified loci will allow the detection of 
associated polymorphisms. It is likely that the variants identified by 
the further study of this population will have a significant role 
in furthering our understanding of the genetic basis of language 
impairments and language development. 
Figure 4 Chromosome 7. Chromosome 7 represented the most consistently linked locus across analyses. Traces are shown for parametric analyses using both
dominant and recessive models with full penetrance, three non-parametric models utilising expected allele frequencies derived from CEPH population, from
genotyped founders in the sub-pedigrees and from all genotyped individuals. Traces are also shown for identified stretches of homozygosity (where the y axis
represents the number of individuals found to be homozygous across the region) and association P-values (relative to the secondary y axis).
Abbreviations: Chr, chromosome; posn, position; SNP, single nucleotide polymorphism.

Any SNP that has a D'>0.4 and a pairwise LOD>2.0 with the associated haplotype is shown. Measures of LD were evaluated both in nuclear families and in the extended pedigree as shown in 
Figure 1. 

The associated haplotype was formed from SNPs rs727714 and rs969356. These two SNPs gave the maximum NPL score of the non-parametric linkage analyses using allele frequencies from all 
genotyped individuals. The peak of linkage in the non-parametric analyses using allele frequencies from founder and CEPH individuals fell across SNPs rs1524341 whereas the peak of parametric
linkage fell at SNPs rs1476640 and rs760885.
All SNPs are intronic unless otherwise stated. 